Geometry and convergence of natural policy gradient methods
Authors: Johannes Müller, Guido Montúfar
Abstract
We study the convergence of several natural policy gradient (NPG) methods in infinite-horizon discounted Markov decision processes with regular policy parametrizations. For a variety of NPGs and reward functions we show that the trajectories in state-action space are solutions of gradient flows with respect to Hessian geometries, based on which we obtain global convergence guarantees and convergence rates. In particular, we show linear convergence for unregularized and regularized NPG flows with the metrics proposed by Kakade and by Morimura and co-authors, by observing that these arise from the Hessian geometries of conditional entropy and entropy respectively. Further, we obtain sublinear convergence rates for Hessian geometries arising from other convex functions like log-barriers. Finally, we interpret the discrete-time NPG methods with regularized rewards as inexact Newton methods if the NPG is defined with respect to the Hessian geometry of the regularizer. This yields local quadratic convergence rates of these methods for a step size equal to the inverse penalization strength.
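For intuition about one of the methods studied here: in the tabular softmax case, Kakade's NPG step is known to reduce (up to step-size normalization) to a multiplicative update of the policy by exponentiated Q-values. Below is a minimal sketch on a toy MDP; the two-state example, the step size eta, and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of a tabular NPG step (Kakade's metric) on a toy two-state MDP.
# With a softmax parametrization the update reduces, per state, to
# pi(a|s) <- pi(a|s) * exp(eta * Q^pi(s, a)), followed by renormalization.

gamma = 0.9
# P[s, a, s']: transition probabilities; r[s, a]: rewards (toy values)
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]]])
r = np.array([[0.0, 1.0],
              [1.0, 0.0]])
nS, nA = r.shape

def q_values(pi):
    """Exact Q^pi by solving the Bellman equation (fine for tiny MDPs)."""
    r_pi = np.einsum('sa,sa->s', pi, r)      # expected reward per state
    P_pi = np.einsum('sa,sat->st', pi, P)    # state transition kernel
    V = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)
    return r + gamma * np.einsum('sat,t->sa', P, V)

pi = np.full((nS, nA), 1.0 / nA)             # uniform initial policy
eta = 1.0                                    # step size (assumed)
for _ in range(50):
    Q = q_values(pi)
    pi = pi * np.exp(eta * Q)                # multiplicative NPG step
    pi /= pi.sum(axis=1, keepdims=True)      # renormalize per state

print(np.round(pi, 3))  # policy concentrates on the better action per state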
Similar resources
Global Convergence of Policy Gradient Methods for Linearized Control Problems
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an “end-to-end” approach, directly optimizing the performance metric of interest; 3) they inherently allow for richly parameterized policies. A notable drawback is th...
Gradient Convergence in Gradient Methods
For the classical gradient method $x_{t+1} = x_t - \gamma_t \nabla f(x_t)$ and several deterministic and stochastic variants, we discuss the issue of convergence of the gradient sequence $\nabla f(x_t)$ and the attendant issue of stationarity of limit points of $x_t$. We assume that $\nabla f$ is Lipschitz continuous, and that the stepsize $\gamma_t$ diminishes to 0 and satisfies standard stochastic approximation conditions. We show that eit...
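A minimal sketch of this iteration, assuming a simple quadratic objective and a 1/t-type stepsize schedule (both illustrative choices, not from this abstract):

```python
import numpy as np

# Classical gradient method x_{t+1} = x_t - gamma_t * grad f(x_t) with a
# diminishing stepsize. The quadratic f(x) = 0.5 * ||x||^2 and the 1/(t+2)
# schedule are assumed purely for illustration.

def grad_f(x):
    return x                        # gradient of f(x) = 0.5 * ||x||^2

x = np.array([5.0, -3.0])
for t in range(10_000):
    gamma_t = 1.0 / (t + 2)         # diminishes to 0, sum gamma_t = infinity
    x = x - gamma_t * grad_f(x)

print(np.linalg.norm(grad_f(x)))    # gradient norm tends to 0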
A Natural Policy Gradient
We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be...
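As a sketch of the underlying computation: for a softmax distribution over a few actions the Fisher information matrix has the closed form $\operatorname{diag}(\pi) - \pi\pi^\top$, and the natural gradient is the vanilla gradient preconditioned by its (pseudo-)inverse. The reward vector, damping constant, and step size below are illustrative assumptions:

```python
import numpy as np

# Natural-gradient ascent for a softmax distribution over three actions.
# The solve is damped because the Fisher matrix is singular along the
# all-ones direction of the softmax parameters.

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

rewards = np.array([1.0, 0.0, 0.5])          # per-action rewards (assumed)
theta = np.zeros(3)

for _ in range(50):
    pi = softmax(theta)
    grad = pi * (rewards - pi @ rewards)     # gradient of J = E_pi[reward]
    F = np.diag(pi) - np.outer(pi, pi)       # Fisher information matrix
    nat_grad = np.linalg.solve(F + 1e-6 * np.eye(3), grad)
    theta += 0.5 * nat_grad                  # steepest ascent in Fisher metric

print(np.round(softmax(theta), 3))           # mass concentrates on action 0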
Gradient Convergence in Gradient Methods with Errors
We consider the gradient method $x_{t+1} = x_t + \gamma_t (s_t + w_t)$, where $s_t$ is a descent direction of a function $f : \mathbb{R}^n \to \mathbb{R}$ and $w_t$ is a deterministic or stochastic error. We assume that $\nabla f$ is Lipschitz continuous, that the stepsize $\gamma_t$ diminishes to 0, and that $s_t$ and $w_t$ satisfy standard conditions. We show that either $f(x_t) \to -\infty$ or $f(x_t)$ converges to a finite value and $\nabla f(x_t) \to 0$ (with probability 1 in t...
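A minimal sketch of this scheme with a zero-mean stochastic error term, assuming a smooth test function and a Robbins-Monro stepsize schedule (all illustrative choices):

```python
import numpy as np

# Gradient method with errors, x_{t+1} = x_t + gamma_t * (s_t + w_t), where
# s_t is a descent direction and w_t is zero-mean noise. The test function
# f(x) = sum_i log cosh(x_i), the noise scale, and the stepsize schedule
# (square-summable but not summable) are illustrative assumptions.

rng = np.random.default_rng(0)

def grad_f(x):
    return np.tanh(x)                      # gradient of sum_i log cosh(x_i)

x = np.array([5.0, -3.0])
for t in range(50_000):
    gamma_t = 1.0 / (t + 10)               # sum = inf, sum of squares < inf
    s_t = -grad_f(x)                       # exact descent direction
    w_t = rng.normal(scale=0.5, size=2)    # zero-mean stochastic error
    x = x + gamma_t * (s_t + w_t)

print(np.linalg.norm(grad_f(x)))           # tends to 0 with probability 1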
Journal
Journal title: Information Geometry
Year: 2023
ISSN: 2511-2481, 2511-249X
DOI: https://doi.org/10.1007/s41884-023-00106-z